This study analyses the following three Mycobacterium smegmatis samples:
Alexandre Stella and Hung Le prepared 1 mg of digested peptides (urea lysis followed by in-solution digestion) and subjected 500 µg to TiO2 enrichment. The samples before and after phospho-enrichment were injected onto the Q-Exactive Plus in technical duplicate. We work with 3 biological replicates (independent transfections).
The raw files of the proteomic data set are in RAW/Proteome_MQ041217/. We work with the MaxQuant table proteinGroups.txt.
I remove the reverse (“REV_”) and contaminant (“CON_”) entries, and I keep only the proteins identified/quantified with at least 2 unique peptides.
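A minimal sketch of this filtering step in R, on a toy table. The column names `Protein.IDs` and `Unique.peptides` are assumed to be what `read.delim()` produces from proteinGroups.txt, the accessions are invented for illustration, and matching the prefix only at the start of the ID string is an assumption about how mixed protein groups are handled.

```r
# Toy stand-in for proteinGroups.txt (invented accessions, assumed column names).
protein_groups <- data.frame(
  Protein.IDs = c("A0QQC8", "REV_A0R1B2", "CON_P00761", "A0QTE1", "A0R5M3"),
  Unique.peptides = c(5, 3, 8, 1, 2),
  stringsAsFactors = FALSE
)

# Flag reverse (decoy) and contaminant entries by the prefix of the leading ID.
is_decoy_or_contaminant <- grepl("^(REV_|CON_)", protein_groups$Protein.IDs)

# Keep the remaining proteins supported by at least 2 unique peptides.
filtered <- protein_groups[!is_decoy_or_contaminant &
                             protein_groups$Unique.peptides >= 2, ]

filtered$Protein.IDs
```

On the toy table this keeps the first and last rows only.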
There are 3563 proteins identified in the study.
I export the protein IDs for manual retrieval using https://www.uniprot.org/uploadlists/ (20180117).
We searched the data with 2 proteome annotations from UniProt (strain ATCC 700084 / mc(2)155, UniProt IDs UP000000757 and UP000006158, with 6602 and 6585 entries, respectively). I choose a unique ID per protein so that the proteomic data can be matched to the phospho-proteomic data later on.
The new column “Proteome” indicates which proteome each ID comes from. The “FALSE” entries correspond to the “Pknbtub” sequences that we added and to a “Biognosys7” entry.
| Proteome | Freq |
|---|---|
| (empty) | 44 |
| FALSE | 2 |
| UP000000757: Chromosome | 583 |
| UP000000757: Chromosome; UP000006158: Chromosome | 475 |
| UP000001584: Chromosome | 1 |
| UP000006158: Chromosome | 2458 |
To simplify the table, I keep only one gene name per protein where possible. I keep the gene name first; if it is not available, I keep the “MSMEG” ID, and if there is none, I keep the “MSMEI” ID.
I manually check that there is no ambiguity in the assignment of protein names.
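The priority rule above can be sketched as a small helper. This assumes the UniProt gene-name field is a space-separated string mixing gene names and locus tags; `pick_gene_name` is a hypothetical function name, and the locus tags in the example are invented.

```r
# Pick one name per protein: gene name first, then MSMEG_ tag, then MSMEI_ tag.
pick_gene_name <- function(names_field) {
  candidates <- strsplit(names_field, " ", fixed = TRUE)[[1]]
  candidates <- candidates[nzchar(candidates)]
  is_msmeg <- grepl("^MSMEG_", candidates)
  is_msmei <- grepl("^MSMEI_", candidates)
  gene <- candidates[!is_msmeg & !is_msmei]
  if (length(gene) > 0) return(gene[1])
  if (any(is_msmeg)) return(candidates[is_msmeg][1])
  if (any(is_msmei)) return(candidates[is_msmei][1])
  NA_character_  # nothing usable in the field
}

# Invented example entries, one per case of the rule.
gene_fields <- c("pknB MSMEG_0001 MSMEI_0002",
                 "MSMEG_1234 MSMEI_1201",
                 "MSMEI_4321",
                 "")
picked <- vapply(gene_fields, pick_gene_name, character(1), USE.NAMES = FALSE)
picked
```

A manual check remains useful afterwards, since two proteins can end up with the same name.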
I use the LFQ values for the quantification.
In the following boxplot:
Reproducibility of the MS runs:
I save the protein table as OutputTables/NormIntProt_20190127.txt.
I calculate the mean of the technical replicates:
Verification that the mutation is as expected: I plot the mean signal of the peptides containing the mutation (sequence “”) that are detected by MS/MS.
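A possible way to average the technical duplicates in base R. The run names (`K1_a`, `K1_b`, ...) are hypothetical, and ignoring a missing injection with `na.rm = TRUE` is an assumption, not necessarily what was done here.

```r
# Toy log2 LFQ table: 3 proteins, 2 biological samples injected twice (a/b).
lfq <- data.frame(
  K1_a = c(20.1, NA, 18.0),
  K1_b = c(19.9, 22.0, NA),
  L1_a = c(21.0, NA, 17.5),
  L1_b = c(21.4, NA, 17.7)
)

# Derive the biological sample of each run by dropping the injection suffix.
sample_of_run <- sub("_[ab]$", "", colnames(lfq))

# Average the injections of each sample, ignoring missing injections.
tech_mean <- sapply(unique(sample_of_run), function(s) {
  rowMeans(lfq[, sample_of_run == s, drop = FALSE], na.rm = TRUE)
})

# A protein missing in both injections gives NaN; store it as NA instead.
tech_mean[is.nan(tech_mean)] <- NA
tech_mean
```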
I replace missing values only when at most one value is detected across the 3 experiments in one condition (the conditions being L, K and P).
The missing values are replaced with the 1% quantile of the corresponding condition.
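A sketch of this selective replacement for one condition. The helper `impute_sparse_rows` is hypothetical, and pooling all observed values of the condition to compute the 1% quantile is an assumption about how the quantile is taken.

```r
# Fill NAs only in rows with fewer than `min_observed` values,
# using the `prob` quantile of all observed values in this condition.
impute_sparse_rows <- function(mat, min_observed = 2, prob = 0.01) {
  fill_value <- quantile(mat, probs = prob, na.rm = TRUE, names = FALSE)
  sparse_rows <- rowSums(!is.na(mat)) < min_observed
  # sparse_rows is recycled down each column, so it acts row-wise.
  to_fill <- is.na(mat) & sparse_rows
  mat[to_fill] <- fill_value
  mat
}

# Toy condition K: one row per protein, one column per biological replicate.
k <- rbind(
  c(20, 21, 22),  # complete: untouched
  c(NA, 19, 18),  # two values: NA is kept
  c(NA, NA, 17),  # one value: NAs are replaced
  c(NA, NA, NA)   # no value: NAs are replaced
)
k_imputed <- impute_sparse_rows(k)
k_imputed
```

Rows with two or three measurements are left as they are, so the t-test below uses only measured values for them.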
I perform a two-sided Welch t-test followed by a BH correction of the p-values:
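For illustration, a row-wise Welch test with Benjamini-Hochberg correction can be written with `t.test()` and `p.adjust()` from the stats package. The data below are simulated and assumed to be log2 intensities, so the difference of means is read as a log2 fold change.

```r
set.seed(1)
n_prot <- 50

# Simulated log2 intensities: 3 replicates for each of two conditions.
k <- matrix(rnorm(n_prot * 3, mean = 20), ncol = 3)
l <- matrix(rnorm(n_prot * 3, mean = 20), ncol = 3)
k[1:5, ] <- k[1:5, ] + 4  # spike in 5 changing proteins

# Welch test per protein: two-sided, unequal variances.
p_values <- vapply(seq_len(n_prot), function(i) {
  t.test(k[i, ], l[i, ], alternative = "two.sided", var.equal = FALSE)$p.value
}, numeric(1))

# Benjamini-Hochberg correction across all proteins.
adj_p_values <- p.adjust(p_values, method = "BH")

# Effect size on the log2 scale.
log2_fc <- rowMeans(k) - rowMeans(l)
head(data.frame(log2_fc, p_values, adj_p_values))
```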
I calculate a normalisation factor for each protein in each run. The vertical red line indicates the median intensity of PknB.
I remove the contaminant (“CON_”) and reverse (“REV_”) entries.
There are 3798 phosphorylation sites identified in the study (from 1339 proteins).
I keep only the sites with a localisation probability of at least 75%:
After this filter, 2256 phosphorylation sites remain (from 1175 proteins).
I export the protein IDs for manual retrieval using https://www.uniprot.org/uploadlists/ (20180117).
We searched the data with 2 proteome annotations from UniProt (see document ProteomesComparison). I choose a unique ID per protein so that the proteomic data can be matched to the phospho-proteomic data later on.
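A minimal sketch of the localisation filter. The column name `Localization.prob` is assumed (it is how `read.delim()` would import the MaxQuant “Localization prob” column), and the toy rows are invented.

```r
# Toy site table with invented proteins and positions.
sites <- data.frame(
  Protein = c("P1", "P1", "P2", "P3"),
  Position = c(12, 45, 7, 101),
  Localization.prob = c(0.99, 0.50, 0.75, 0.7499),
  stringsAsFactors = FALSE
)

# Keep sites localised with a probability of at least 0.75 (inclusive).
confident_sites <- sites[sites$Localization.prob >= 0.75, ]

n_sites <- nrow(confident_sites)
n_proteins <- length(unique(confident_sites$Protein))
c(sites = n_sites, proteins = n_proteins)
```

The comparison is inclusive, so a site at exactly 0.75 is kept while one at 0.7499 is dropped.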
Each protein is normalised with a normalisation factor calculated in the proteomics data set.
I remove the contaminant (“CON_”) and reverse (“REV_”) entries from the phospho table.
There are 2256 rows in the phospho table.
I carefully match the protein IDs from the phospho data set to those of the proteome.
We match 94.33% of the sites to the corresponding protein value in the proteome.
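The matched fraction can be computed with `match()`, as in this sketch. The IDs are invented, and matching on a single ID per site is an assumption (the real matching may need to handle several IDs per protein group).

```r
# Invented IDs: one entry per phosphosite, one entry per quantified protein.
phospho_ids <- c("P1", "P1", "P2", "P4", "P5")
proteome_ids <- c("P1", "P2", "P3", "P4")

# Row of the proteome table for each site, NA when the protein is absent.
row_in_proteome <- match(phospho_ids, proteome_ids)

# Percentage of sites with a protein-level value available.
pct_matched <- 100 * mean(!is.na(row_in_proteome))
round(pct_matched, 2)
```

The index vector `row_in_proteome` can then be used to pull the protein-level values next to each site.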
Importance of components:

| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 |
|---|---|---|---|---|---|---|---|---|---|
| Standard deviation | 6.2088 | 3.4700 | 0.99933 | 0.90857 | 0.63822 | 0.56800 | 0.52518 | 0.4334 | 0.33884 |
| Proportion of Variance | 0.7176 | 0.2241 | 0.01859 | 0.01537 | 0.00758 | 0.00601 | 0.00513 | 0.0035 | 0.00214 |
| Cumulative Proportion | 0.7176 | 0.9417 | 0.96028 | 0.97564 | 0.98323 | 0.98923 | 0.99437 | 0.9979 | 1.00000 |
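An importance table of this kind is what `summary()` returns for a `prcomp` object. The sketch below uses simulated data with sites in rows and 9 samples in columns, which is an assumption suggested by the 9 components reported here; the sample names and the centring and scaling options are also assumed.

```r
set.seed(42)

# Simulated complete matrix: 200 sites (rows) x 9 samples (columns).
sample_names <- paste0(rep(c("K", "L", "P"), each = 3), 1:3)
intensities <- matrix(rnorm(200 * 9, mean = 20, sd = 2), nrow = 200,
                      dimnames = list(NULL, sample_names))

# prcomp() does not accept missing values, so the input must be complete.
pca <- prcomp(intensities, center = TRUE, scale. = FALSE)

# 3 x 9 matrix: standard deviation, proportion and cumulative proportion.
importance <- summary(pca)$importance
importance
```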
I add this information to the output table.
Selective replacement when there is only one or zero measurement for a site in a given condition (K, L, P).
Replacement with 1% quantile of all the conditions, to avoid a bias due to the increase in general intensity when the kinase is over-expressed.
Two-sided Welch t-test followed by BH correction of the p-values.
##
## FALSE TRUE
## 2254 2
##
## FALSE TRUE
## 871 1385
##
## FALSE TRUE
## 870 1386
Volcano with sites of interest:
Volcano for the paper (with the known substrates of PknB):
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggrepel_0.8.0 gplots_3.0.1.1 knitr_1.21 corrplot_0.84
## [5] reshape2_1.4.3 ggplot2_3.1.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 RColorBrewer_1.1-2 highr_0.7
## [4] pillar_1.3.1 compiler_3.5.2 plyr_1.8.4
## [7] bindr_0.1.1 bitops_1.0-6 tools_3.5.2
## [10] digest_0.6.18 evaluate_0.12 tibble_2.0.1
## [13] gtable_0.2.0 pkgconfig_2.0.2 rlang_0.3.1
## [16] yaml_2.2.0 xfun_0.4 bindrcpp_0.2.2
## [19] withr_2.1.2 dplyr_0.7.8 stringr_1.3.1
## [22] caTools_1.17.1.1 gtools_3.8.1 grid_3.5.2
## [25] tidyselect_0.2.5 glue_1.3.0 R6_2.3.0
## [28] rmarkdown_1.11 gdata_2.18.0 purrr_0.2.5
## [31] magrittr_1.5 scales_1.0.0 htmltools_0.3.6
## [34] assertthat_0.2.0 colorspace_1.4-0 labeling_0.3
## [37] KernSmooth_2.23-15 stringi_1.2.4 lazyeval_0.2.1
## [40] munsell_0.5.0 crayon_1.3.4